
I, Tetsuo YAMATO,



residing at 7 -4 Kamisaginomiya 2 -chome, Nakano-ku, Tokyo, Japan, 

being competent in the Japanese and English languages, certify
that to the best of my knowledge and belief the attached English
translation is a true and faithful translation made by me of
Japanese Patent Application No. 11-246393 filed on August 31,
1999.

Dated: October 30, 2003



RECEIVED
NOV 04 2003
Technology Center 2600




Tetsuo YAMATO 




[Document Name] Patent Application
[Reference Number] 54P0080
[Filing Date] August 31, 1999
[Addressee] Commissioner of Patent Office
[International Patent Class] G10L 7/08
                             G10L 9/18
[Inventor]
[Address] c/o Kawagoe Koujou, Pioneer Corporation,
25-1 Aza Nishimachi, Ooaza Yamada, Kawagoe-shi, Saitama-ken,
Japan
[Name] Shoutarou YODA
[Patent Applicant]
[Identified Number] 000005016
[Name] Pioneer Corporation
[Agent]
[Identified Number] 100063565
[Patent Attorney]
[Name] Nobukiyo KOBASHI
[Indication of Charge]
[Book Number] 011659
[Sum of Money] 21000
[List of Exhibits]
[Document] Specification 1
[Document] Drawings 1
[Document] Abstract 1
[Proof] Need



[Document Name] Specification 

[Title of the Invention] SPEECH RECOGNITION SYSTEM 
[Scope of the Claimed Invention]

[Claim 1] A speech recognition system comprising:
a plurality of voice pickup means for picking up uttered
voices;

determination means for determining a speech signal
suitable for speech recognition from speech signals output from
said plurality of voice pickup means; and

speech recognition means for performing speech recognition
based on said speech signal determined by said determination
means.

[Claim 2] The speech recognition system according to 
claim 1, wherein said determination means acquires an average 
S/N value and average voice power of each of said speech 
signals output from said plurality of voice pickup means and 
determines that of said speech signal whose average S/N value 
and average voice power are greater than respective 
predetermined threshold values as said speech signal suitable 
for speech recognition. 

[Claim 3] The speech recognition system according to
claim 2, wherein said determination means determines a
candidate order of those speech signals whose average S/N
values and average voice powers are greater than said
respective predetermined threshold values and which are
candidates for said speech signal suitable for speech
recognition, in accordance with said average S/N values and
average voice powers; and

said speech recognition means sequentially executes speech
recognition on said candidates in accordance with said
candidate order from a highest candidate to a lower one.

[Claim 4] The speech recognition system according to
any one of claims 1 to 3, wherein said determination means
treats those of said speech signals which are other than said
speech signal suitable for speech recognition as noise signals.

[Claim 5] The speech recognition system according to
any one of claims 1 to 4, wherein of other speech signals than
said speech signal suitable for speech recognition, that speech
signal whose average S/N value and average voice power become
minimum is treated as a noise signal by said determination
means.

[Detailed Description of the Invention]
[0001]

[Technical Field of the Invention]
The present invention relates to a speech recognition
system that allows electronic equipment to be
controlled or manipulated with uttered voices or speech.

[0002]
[Prior Art]

Conventionally, speech recognition systems of this type
adapted to electronic equipment, such as an on-board
audio system and an on-board navigation system, are known.
[0003] 

In an on-board audio system equipped with a speech
recognition system, when a passenger says the name of a desired
radio broadcasting station, for example, the speech recognition
system recognizes the uttered speech and automatically tunes to
the reception frequency of the radio broadcasting station based
on the recognition result. This improves the operability of
the on-board audio system and makes it easier for a passenger
to use the on-board audio system.
[0004] 

This speech recognition system also has other
capabilities that relieve a passenger of the burden of
operating an MD (Mini Disc) player and/or CD (Compact Disc)
player. When the passenger loads an information-carrying
recording/reproducing medium, such as an MD disc, into the MD
player and says the title of a musical piece recorded on that
recording/reproducing medium, for example, the speech
recognition system recognizes the uttered speech and
automatically plays the selected musical piece.
[0005] 

An on-board navigation system equipped with a speech
recognition system is provided with a capability of recognizing
speech uttered by a driver or the like to specify the name of
the destination and displaying a map showing the route from the
present location to the destination. This capability allows
the driver to concentrate on driving the vehicle, thus ensuring
a safer driving environment.
[0006]

[Problems to be Solved by the Invention]
The above-described conventional speech
recognition systems are designed to cope with a single person
who utters words of instruction. The conventional speech
recognition systems therefore have only a single microphone for
inputting speech, provided at a location nearest to the driver,
who is very likely to use the microphone.
[0007]

Thus, other passengers who are seated far from the
microphone must speak loudly toward the
microphone to secure a sufficient input voice level. To
improve the speech recognition precision of such a speech
recognition system, passengers other than the driver must
also speak loudly toward the microphone so that their
speech is input without being affected by noise in
the vehicle.

[0008]

The present invention was made in consideration of these
problems. Accordingly, it is an object of the present
invention to provide a speech recognition system which has
improved operability.
[0009]

[Means for Solving the Problems]

To achieve the above-mentioned object, according to the
present invention, there is provided a speech recognition
system which comprises a plurality of voice pickup means for
picking up uttered voices, a determination means for
determining a speech signal suitable for speech recognition
from speech signals output from the plurality of voice pickup
means, and a speech recognizer for performing speech
recognition based on the speech signal determined by the
determination means.

[0010]

It is preferable that the determination means acquires an
average S/N value and average voice power of each of the speech
signals output from the plurality of voice pickup means and
determines that of the speech signal whose average S/N value
and average voice power are greater than respective
predetermined threshold values as the speech signal suitable
for speech recognition.

[0011]

It is a characteristic of the present invention that the 
determination means determines a candidate order of those 
speech signals whose average S/N values and average voice 
powers are greater than the respective predetermined threshold 
values and which are candidates for the speech signal suitable 
for speech recognition, in accordance with the average S/N 
values and average voice powers; and the speech recognizer 
sequentially executes speech recognition on the candidates in 
accordance with the candidate order from a highest candidate to 
a lower one. 
[0012] 

Also, it is a characteristic of the present invention that
the determination means treats those of the speech signals
which are other than the speech signal suitable for speech
recognition as noise signals.
[0013]

In any of the above speech recognition systems and their
preferable modes, of other speech signals than the speech
signal suitable for speech recognition, that speech signal
whose average S/N value and average voice power become minimum
may be treated as a noise signal by the determination means.
[0014] 

According to the above structures, when a speaker makes a 
desired speech, a speech signal and a noise signal suitable for 
speech recognition are automatically determined from the 
individual speech signals output from a plurality of voice 
pickup means (or voice pickup means) and speech recognition is 
carried out based on the determined speech signal and noise 
signal. Accordingly, the speaker has only to utter words or 
voices without consciously making such a speech to a specific 
voice pickup means. This leads to an improved operability of 
the speech recognition system. 
[0015] 

[Preferred Embodiment of the Invention]

With reference to the accompanying drawings, a description
will now be given of a preferred embodiment of the present
invention as adapted to a speech recognition system which
enables voice- or speech-based control or manipulation of
electronic equipment installed in a vehicle, such as an
on-board audio system or an on-board navigation system.
[0016]
FIG. 1 is a block diagram illustrating the structure of a
speech recognition system according to this embodiment of this
invention. Referring to this diagram, the speech recognition
system comprises a plurality of microphones M1 to MN as voice
pickup means, a plurality of pre-circuits CC1 to CCN, a
multiplexer 1, an A/D (Analog-to-Digital) converter (ADC) 2, a
demultiplexer 3, a storage section 4, a speech detector 5, a
data analyzer 6, a speech recognizer 7, a controller 8 and a
speech switch 9.

[0017]

The pre-circuits CC1-CCN, the multiplexer 1, the A/D
converter 2, the demultiplexer 3, the storage section 4, the
speech detector 5, the data analyzer 6 and the controller 8
constitute determination means which determines a speech signal
and noise signal suitable for speech recognition.
[0018] 

The single speech switch 9 is provided in the vicinity of 
a driver seat, for example, on a front dash board or one end of 
a front door by the driver seat. 
[0019] 

The controller 8 has a microprocessor (MPU), which
controls the general operation of this speech recognition
system. When the speech switch 9 is switched on, sending an ON
signal SW to the microprocessor, the microprocessor causes the
microphones M1-MN to initiate a voice pickup operation. The
speech detector 5 has number-of-speeches counters FC1-FCN that
are used to determine to which microphone an uttered speech is
directed; their details will be given in a later
description of the operation of the speech recognition system.
[0020] 

The individual microphones M1-MN are provided at locations
where it is easy to pick up speech uttered by individual
passengers, e.g., in the vicinity of the individual passenger
seats including the driver seat.

[0021]

In one example where four microphones M1-M4 are placed in
a 4-seat vehicle, the microphones M1 and M2 are placed in front
of the driver seat and the front passenger seat and the
microphones M3 and M4 are placed in front of the rear passenger
seats, e.g., at the corresponding roof portions or at the back of
the driver seat and the front passenger seat, as shown in a plan
view of FIG. 2(A). This way, the individual microphones M1-M4
are associated with the respective passengers.

[0022] 

In another example as shown in a plan view of FIG. 2(B),
the microphones M1 and M2 may be placed in the front door by the
driver seat and the front door by the front passenger seat and
the microphones M3 and M4 are placed in the rear doors by the
respective rear passenger seats, so that the individual
microphones M1-M4 are associated with the respective passengers.

[0023]

In a further example, the microphones M1-M4 may be
provided at combined locations shown in FIGS. 2(A) and 2(B).
Specifically, the microphone M1 is placed in front of the
driver seat as shown in FIG. 2(A) or in the front door by the
driver seat as shown in FIG. 2(B), so that a single microphone
is provided for the driver who sits on the driver seat.
Likewise, either the location shown in FIG. 2(A) or the
location shown in FIG. 2(B) is selected for any of the
remaining microphones M2-M4.
[0024] 

In the case of a wagon type vehicle or the like which
holds a greater number of seats, for example, a greater number
of microphones M1-MN are provided in accordance with the seats
and at the locations where it is easy to pick up speech
uttered by individual passengers, as shown in plan views of
FIGS. 3(A) and 3(B). Note that the microphones M1-MN may be
provided at combined locations shown in FIGS. 3(A) and 3(B) as
per the aforementioned case of the 4-seat vehicle.

[0025] 

Moreover, it is to be noted that the aforementioned
microphone layouts have been given simply as examples, and are
to be considered as illustrative and not restrictive. In practice,
system information that is used in the speech recognition
system of this invention is constructed beforehand in
consideration of the characteristics of voice transmission from
individual passengers to the respective microphones. Strictly
speaking, therefore, the conditions for setting the microphones
are not restricted at all. Further, the number of microphones
can be determined to be equal to or smaller than the maximum
number of passengers predetermined in accordance with the type of
vehicle.
[0026] 

Also, the layout of the individual microphones is not
limited to a simple layout that makes the distances from the
microphones to the respective passengers equal to one another.
Those distances and the locations of the individual microphones
may be determined based on the results of analysis of the voice
characteristics in a vehicle previously acquired through
experiments or the like in such a way that the characteristics
of voice transmission from the respective passengers to the
microphones become substantially the same.

[0027] 

Returning to FIG. 1, the microphones M1-MN are
connected to the respective pre-circuits CC1-CCN, thus
constituting N channels of signal processing systems.

[0028]

Each of the pre-circuits CC1-CCN has an amplifier (not
shown) which amplifies the amplitude level of the associated
one of input speech signals S1 to SN, supplied from the
microphones M1-MN, to a level that is suitable for signal
processing, and a band-pass filter (not shown) which passes
only a predetermined frequency component of the amplified input
speech signal. The pre-circuits CC1-CCN supply input speech
signals S1' to SN', which have passed the respective band-pass
filters, to the multiplexer 1.

[0029] 

Each band-pass filter is set with a low cut-off frequency
fL (e.g., fL = 100 Hz) for eliminating low-frequency noise
included in the associated one of the input speech signals S1-SN
and a high cut-off frequency fH in consideration of the Nyquist
frequency. The low cut-off frequency fL and high cut-off
frequency fH are set so that the frequency range of voices that
human beings utter is included in the range between those two
frequencies.
[0030] 

As shown in FIG. 4, the multiplexer 1 comprises analog
switches AS1 to ASN for N channels. The input speech signals
S1'-SN' from the pre-circuits CC1-CCN are supplied to the input
terminals of the respective analog switches AS1-ASN, whose output
terminals are connected together to the A/D converter 2. In
accordance with channel switch signals CH1 to CHN supplied from
the controller 8, the analog switches AS1-ASN exclusively switch
the input speech signals S1'-SN' and supply the switched input
speech signals S1'-SN' to the A/D converter 2.

[0031]

The A/D converter 2 converts the input speech signals S1'-SN',
sequentially supplied from the multiplexer 1, to digital
input data D1 to DN in synchronism with a predetermined sampling
frequency f, and supplies the digital input data D1-DN to the
demultiplexer 3.

[0032] 

The sampling frequency f is set by a sampling clock CKADC
from the controller 8 and is determined in consideration of
anti-aliasing. More specifically, the sampling frequency f is
determined to be equal to or higher than approximately twice
the high cut-off frequency fH of the band-pass filter, and is
set, for example, in a range of 8 kHz to 11 kHz.
[0033] 

The demultiplexer 3 comprises analog switches AW1 to AWN
for N channels, as shown in FIG. 4. The analog switches AW1-AWN
have their input terminals connected together to the output
terminal of the A/D converter 2 and their output terminals
respectively connected to memory areas ME1 to MEN for N channels
provided in the storage section 4. In accordance with the
channel switch signals CH1-CHN supplied from the controller 8,
the analog switches AW1-AWN exclusively switch the input data
D1-DN and supply the switched input data D1-DN to the respective
memory areas ME1-MEN.

[0034] 

Referring now to the timing chart in FIG. 5, the
operations of the multiplexer 1, the A/D converter 2 and the
demultiplexer 3 will be explained. When the speech switch 9 is
set on, the resultant ON signal SW is received by the
controller 8, which in turn outputs the sampling clock CKADC and
the channel switch signals CH1-CHN.

[0035]

The sampling clock CKADC has a pulse waveform which repeats
the logical inversion N times during a period (sampling period)
T which is the reciprocal, 1/f, of the sampling frequency f.
The channel switch signals CH1-CHN have pulse waveforms which
sequentially become logic "1" every period T/N of the sampling
clock CKADC.
[0036] 

The multiplexer 1 exclusively performs switching between
enabling and disabling of the input speech signals S1'-SN' in
synchronism with the period T/N in which the channel switch
signals CH1-CHN sequentially become logic "1". As a result, the
input speech signals S1'-SN' are sequentially supplied to the
A/D converter 2 in synchronism with the period T/N to be
converted to the digital data D1-DN. The demultiplexer 3
likewise exclusively performs switching between enabling and
disabling of the input data D1-DN in synchronism with the period
T/N in which the channel switch signals CH1-CHN sequentially
become logic "1". Accordingly, the input data D1-DN from the
A/D converter 2 are distributed and stored in the respective
memory areas ME1-MEN in synchronism with the period T/N.

[0037] 

As sampling of N channels of input speech signals S1'-SN' in
the sampling period T (= 1/f) is repeated in this way, it is
possible to generate N channels of input data D1-DN with only
the single A/D converter 2 in synchronism with the sampling
frequency f and to store the input data D1-DN into the
predetermined memory areas ME1-MEN, respectively.
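
The time-division scheme described above can be illustrated with a minimal Python sketch. It is only a software model of the hardware behavior, not the circuit of FIG. 4: the function names `sample_round_robin` and `demultiplex` are hypothetical, and each inner list stands in for one channel's already band-limited signal.

```python
from typing import List, Sequence


def sample_round_robin(channels: Sequence[Sequence[float]]) -> List[float]:
    """Model the multiplexer plus single A/D converter.

    In every sampling period T, one sample is taken from each of the
    N channels in turn (one slot of length T/N per channel).
    """
    n_periods = min(len(ch) for ch in channels)
    stream: List[float] = []
    for k in range(n_periods):      # k-th sampling period T
        for ch in channels:         # one T/N slot per channel
            stream.append(ch[k])
    return stream


def demultiplex(stream: Sequence[float], n_channels: int) -> List[List[float]]:
    """Model the demultiplexer: route each slot to its channel's memory area."""
    memory_areas: List[List[float]] = [[] for _ in range(n_channels)]
    for i, value in enumerate(stream):
        memory_areas[i % n_channels].append(value)
    return memory_areas
```

Because the multiplexer and demultiplexer are driven by the same channel switch signals, demultiplexing the interleaved stream recovers each channel's samples in their original order.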

[0038] 

The storage section 4, which is constituted by a
semiconductor memory, has the aforementioned memory areas
ME1-MEN for N channels. That is, the memory areas ME1-MEN are
provided in association with the microphones M1-MN.

[0039]

As shown in FIG. 4, each of the memory areas ME1-MEN has a
plurality of frame areas MF1, MF2 and so forth for storing the
associated one of the input data D1-DN, frame by frame of a
predetermined number of samples.

[0040] 

Referring to the memory area ME1, for example, the frame
areas MF1, MF2 and so forth sequentially store the input data D1
supplied from the demultiplexer 3 by a predetermined number of
samples (256 samples in this embodiment) in accordance with an
address signal ADR from the controller 8. That is, every 256
samples of the input data D1 are stored in each frame area MF1,
MF2 or the like in each frame period TF, which is 256 x T as
shown in FIG. 5. Input data for one frame period (1TF), which
is stored in each frame area MF1, MF2 or the like, is called
"frame data".

[0041]

Likewise, the input data D2-DN are stored, 256 samples
each, in the frame areas MF1, MF2 and so forth in the remaining
memory areas ME2-MEN in each frame period (1TF).
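
As a rough illustration of the framing described above, the following Python sketch splits one channel's samples into 256-sample frames and computes the frame period 256 x T. The names `split_into_frames` and `frame_period_seconds` are hypothetical, and discarding a trailing incomplete frame is an assumption made here for simplicity.

```python
FRAME_SIZE = 256  # samples per frame in this embodiment


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split one channel's input data into complete frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_full = len(samples) // frame_size
    return [list(samples[i * frame_size:(i + 1) * frame_size])
            for i in range(n_full)]


def frame_period_seconds(sampling_hz, frame_size=FRAME_SIZE):
    """Frame period = frame_size x T, where T = 1 / sampling frequency."""
    return frame_size / sampling_hz
```

For example, at a sampling frequency of 8 kHz the sampling period T is 125 microseconds, so one frame period is 256 x 125 microseconds = 32 ms.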

[0042] 

The speech detector 5 and the data analyzer 6 are
constituted by a DSP (Digital Signal Processor).
[0043]

Every time frame data is stored in the frame area MF1, MF2
and so forth in each of the memory areas ME1-MEN, the speech
detector 5 computes the LPC (Linear Predictive Coding) residual
of the latest frame data and determines if the computed value
is equal to or greater than a predetermined threshold value
THD1. When the computed value becomes equal to or greater than
the predetermined threshold value THD1, the speech detector 5
determines that the latest frame data is speech frame data
produced from a speech. When the computed value is smaller
than the predetermined threshold value THD1, the speech
detector 5 determines that the latest frame data is input data
that has not been produced from a speech, i.e., noise frame
data that has been produced by noise in a vehicle.
[0044] 

When the computed LPC residual value remains equal to or
greater than the predetermined threshold value THD1 over three
frame periods (3TF), the speech detector 5 confirms that the
frame data over the three frame periods (3TF) is definitely
speech frame data produced from a speech and transfers speech
detection data DCT1 indicative of the result of the decision to
the controller 8.

[0045] 

More specifically, the LPC residuals of frame data stored
in the individual frame areas MF1, MF2 and so forth in each of
the memory areas ME1-MEN are individually computed channel by
channel, and each channel-by-channel computed LPC residual
value is compared with the threshold value THD1 to determine,
channel by channel, if the frame data is speech frame data
produced from a speech.

[0046]
In other words, given that ε1 is the computed LPC
residual value of the first channel associated with the
microphone M1, ε2 is the computed LPC residual value of the
second channel associated with the microphone M2, and likewise
ε3 to εN are the computed LPC residual values of the third to
N-th channels respectively associated with the microphones M3-MN,
the computed values ε1-εN are compared with the threshold
value THD1. The frame data that corresponds to the channel
whose computed LPC residual value becomes equal to or greater
than the threshold value THD1 is determined as speech frame
data that has been generated from a speech. Further, the
speech frame data that corresponds to the channel whose
computed LPC residual value remains equal to or greater than
the threshold value THD1 over three frame periods (3TF) is
confirmed as speech frame data that is definitely generated from
a speech.
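
The per-channel decision described above can be sketched in Python as follows. This is a minimal illustration, not the DSP implementation: it assumes the LPC residual value of each frame has already been computed elsewhere and is passed in as a list, and the function name `confirm_speech_frames` and the `min_run` parameter are hypothetical.

```python
def confirm_speech_frames(residuals, threshold, min_run=3):
    """Flag frames confirmed as speech for one channel.

    A frame is a speech candidate when its LPC residual value is equal
    to or greater than the threshold. Candidates are confirmed only when
    they form a run of at least `min_run` consecutive frames.
    """
    flags = [False] * len(residuals)
    run_start = None
    # A trailing None acts as a sentinel that closes a run at the end.
    for i, value in enumerate(list(residuals) + [None]):
        above = value is not None and value >= threshold
        if above and run_start is None:
            run_start = i
        elif not above and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags
```

Running the function once per channel yields, for every memory area, the frames that would be reported to the controller as confirmed speech frame data; an isolated frame above the threshold is not confirmed.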

[0047] 

When a speech has been directed to the microphone M1 and
the uttered voices have not been input to the remaining
microphones M2-MN, for example, only the frame data that is
stored in the memory area ME1 of the channel associated with
the microphone M1 is determined and confirmed as speech frame
data that has been produced from the speech, and the frame data
stored in the memory areas ME2-MEN associated with the remaining
microphones M2-MN are determined as noise frame data generated
from noise in the vehicle.

[0048]
Also, when a speech has been directed to the microphone M1
and the uttered voices have reached the microphone M2 but not
the remaining microphones M3-MN, for example, the frame data
stored in the memory areas ME1 and ME2 of the channels
associated with the microphones M1 and M2 are both determined
and confirmed as speech frame data produced from the speech, and
the frame data stored in the memory areas ME3-MEN associated
with the remaining microphones M3-MN are determined as noise
frame data.

[0049] 

In the above-described manner, the speech detector 5
computes the LPC residual of each of the frame data stored in
the memory areas ME1-MEN, compares it with the threshold value
THD1 to determine if uttered voices have been input to any
microphone and to determine the frame period in which the uttered
voices have been input, and transfers the speech detection data
DCT1 having information on those decisions to the controller 8.

[0050] 

Moreover, the speech detection data DCT1 is transferred to
the controller 8 as predetermined code data which indicates the
memory area where speech frame data has been stored over the
aforementioned three frames or more (hereinafter this memory
area will be called "speech memory channel") and its frame area
(hereinafter called "speech memory frame").

[0051]

Specifically, the speech detection data DCT1 has an
ordinary data structure of, for example, DCT1{CH1(TF1, TF2-TFn),
CH2(TF1, TF2-TFn), ..., CHN(TF1, TF2-TFn)}. CH1-CHN are flag data
representing the individual channels, and TF1, TF2-TFn are flag
data corresponding to the individual frame areas MF1, MF2-MFn.
[0052]

When an uttered speech is input only to the microphone M1
and speech frame data is stored in the third and subsequent
frame areas MF3, MF4 and so forth, speech detection data DCT1 of
binary codes of DCT1{1(0,0,1,1-1), 0(0,0,0-0), ..., 0(0,0,0-0)}
is transferred to the controller 8.
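
One way to picture the speech detection data is as a list of per-channel records, each holding a channel flag and a list of frame flags. The Python sketch below is only an illustrative encoding under that assumption; the actual code format is not specified beyond the flag structure, and the name `build_detection_data` is hypothetical.

```python
def build_detection_data(channel_frame_flags):
    """Encode per-channel, per-frame speech decisions as binary flags.

    `channel_frame_flags` holds one list of booleans per channel, with
    one boolean per frame area. Each output record is
    (channel_flag, [frame_flag, ...]), where the channel flag is 1 if
    any frame in that channel holds speech frame data.
    """
    data = []
    for flags in channel_frame_flags:
        frame_bits = [1 if f else 0 for f in flags]
        data.append((1 if any(frame_bits) else 0, frame_bits))
    return data
```

With speech stored only in the first channel from the third frame area onward, the first record has channel flag 1 and frame flags 0, 0, 1, 1, ..., while every other record consists entirely of zeros, matching the binary code example given above.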

[0053] 

When the speech detection data DCT1 is transferred to the
controller 8, the controller 8 generates control data CNT1
indicating the speech memory channel and speech memory frame
based on the speech detection data DCT1, and sends the control
data CNT1 to the data analyzer 6.

[0054] 

The data analyzer 6 comprises an optimal-speech
determining section 6a, a noise determining section 6b, an
average-S/N computing section 6c, an average-voice-power
computing section 6d, an average-noise-power computing section
6e, a speech condition table 6f and a noise selection table 6g.
When receiving the control data CNT1 from the controller 8, the
data analyzer 6 initiates a process of determining speech frame
data and noise frame data suitable for speech recognition.

[0055] 

The average-voice-power computing section 6d acquires
information on the speech memory channel and speech memory
frame from the control data CNT1, reads speech frame data from
the memory area that corresponds to that speech memory channel
and speech memory frame and computes average voice power P(n)
of the speech frame data channel by channel. The variable n in
the average voice power P(n) indicates a channel number.
[0056] 

When speech frame data is stored in the memory areas
ME1-ME4 corresponding to the channels CH1-CH4 as shown in FIGS. 6(A)
to 6(D), for example, the average voice power P(1) to P(4) of
plural pieces of speech frame data corresponding to a plurality
of predetermined frame periods (m2 x TF) from a time ts at which
a speech has started are computed channel by channel. The
average voice power P(n) is computed by obtaining the sum of
squares of speech frame data in the frame periods (m2 x TF) and
then dividing the sum by the number of the frame periods (m2 x
TF).
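
The average power computation described above can be sketched in Python as follows. The sketch follows the text literally (sum of squares over all samples, divided by the number of frame periods); the function name `average_power` is hypothetical, and frames are assumed to be plain lists of sample values.

```python
def average_power(frames):
    """Sum of squares of all samples in the given frames, divided by
    the number of frame periods (i.e. the number of frames)."""
    if not frames:
        raise ValueError("at least one frame is required")
    total = sum(x * x for frame in frames for x in frame)
    return total / len(frames)
```

Applied to the speech frames that follow the start of a speech, this gives the average voice power of a channel; applied to frames that precede the start of the speech, the same computation gives an average noise power.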

[0057] 

The average-noise-power computing section 6e acquires
information on the speech memory channel and speech memory
frame from the control data CNT1, reads noise frame data
preceding the speech frame data by a plurality of frame periods
(m1 x TF) from the memory area that corresponds to that speech
memory channel and speech memory frame and computes average
noise power NP(n) of the noise frame data channel by channel.
The variable n in the average noise power NP(n) indicates a
speech channel, and the average noise power NP(n) is computed
by obtaining the sum of squares of noise frame data in the
frame periods (m1 x TF) and then dividing the sum by the number
of the frame periods (m1 x TF).
[0058] 

When speech frame data is stored in the memory areas
ME1-ME4 corresponding to the channels CH1-CH4 as shown in FIGS. 6(A)
to 6(D), for example, the average noise power NP(n) of plural
pieces of noise frame data preceding by a plurality of frame
periods (m1 x TF) the time ts at which a speech has started
(at which storage of the speech frame data has started) is
computed.

[0059] 

The average-S/N computing section 6c computes an average
S/N value SN(n) which represents the value of the
signal-to-noise ratio for each speech channel based on the average voice
power P(n) computed by the average-voice-power computing
section 6d and the average noise power NP(n) computed by the
average-noise-power computing section 6e.

[0060]

For example, in the case where the channels CH1-CH4 are
speech channels as shown in FIGS. 6(A) to 6(D), the average S/N
values SN(1) to SN(4) of the individual channels CH1-CH4 are
computed from the following equations 1 to 4.

[0061]

SN(1) = P(1)/NP(1) ... (1)

SN(2) = P(2)/NP(2) ... (2)

SN(3) = P(3)/NP(3) ... (3)

SN(4) = P(4)/NP(4) ... (4)

Logarithmic values of the average S/N values SN(1) to
SN(4) computed from the equations 1 to 4 may be taken as the
average S/N values SN(1)-SN(4) of the individual channels
CH1-CH4.
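
Equations 1 to 4 share one form, SN(n) = P(n)/NP(n), which the short Python sketch below illustrates. The function name `average_snr` and the `in_decibels` option are hypothetical; in particular, expressing the logarithmic value as 10 x log10 of the power ratio is an assumption, since the text only states that logarithmic values may be used.

```python
import math


def average_snr(voice_power, noise_power, in_decibels=False):
    """Average S/N value of one channel: SN(n) = P(n) / NP(n).

    With in_decibels=True, the logarithmic value 10 * log10(P / NP) is
    returned instead (the decibel scaling is an assumption).
    """
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    ratio = voice_power / noise_power
    return 10.0 * math.log10(ratio) if in_decibels else ratio
```

Whichever form is used, the threshold against which the value is compared has to be expressed in the same scale.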

[0062]

The optimal-speech determining section 6a compares the
average S/N value SN(n) acquired by the average-S/N computing
section 6c with a predetermined threshold value THD2, and
compares the average voice power P(n) acquired by the
average-voice-power computing section 6d with a predetermined threshold
value THD3. The optimal-speech determining section 6a then
collates the results of the comparison with the speech
condition table 6f shown in FIG. 7 to determine which channel
of speech frame data is suitable for the speech recognition
process.

[0063]

In other words, as shown in FIG. 7, the speech
condition table 6f stores reference data for ranking speech
frame data in accordance with the relationship between the
average S/N value and the threshold value THD2 and the
relationship between the average voice power and the threshold
value THD3. Referring to the speech condition table 6f based
on the comparison results, the optimal-speech determining
section 6a ranks the speech frame data suitable for speech
recognition and determines the speech frame data of the highest
rank as the one suitable for speech recognition.

[0064] 

Specifically, the optimal-speech determining section 6a
determines the speech frame data whose average S/N value is
equal to or greater than the threshold value THD2 and whose
average voice power is equal to or greater than the threshold
value THD3 as a rank 1 (Rnk1), determines the speech frame data
whose average S/N value is equal to or greater than the
threshold value THD2 and whose average voice power is less than
the threshold value THD3 as a rank 2 (Rnk2), determines the
speech frame data whose average S/N value is smaller than the
threshold value THD2 and whose average voice power is equal to
or greater than the threshold value THD3 as a rank 3 (Rnk3),
and determines the speech frame data whose average S/N value is
smaller than the threshold value THD2 and whose average voice
power is less than the threshold value THD3 as a rank 4 (Rnk4).
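
The four-way classification above can be sketched in a few lines
of Python. This is only an illustrative sketch: the function name
classify_rank and its parameter names are hypothetical, and the
actual contents of the speech condition table 6f in FIG. 7 are
not reproduced here.

```python
def classify_rank(avg_sn, avg_power, thd2, thd3):
    """Return rank 1 to 4 from the two threshold comparisons.

    avg_sn    -- average S/N value SN(n) of one channel
    avg_power -- average voice power P(n) of the same channel
    thd2      -- threshold for the average S/N value (THD2)
    thd3      -- threshold for the average voice power (THD3)
    """
    sn_ok = avg_sn >= thd2
    power_ok = avg_power >= thd3
    if sn_ok and power_ok:
        return 1  # Rnk1: both values reach their thresholds
    if sn_ok:
        return 2  # Rnk2: S/N reaches THD2, power is below THD3
    if power_ok:
        return 3  # Rnk3: S/N is below THD2, power reaches THD3
    return 4      # Rnk4: both values are below their thresholds
```

A value exactly equal to its threshold counts as meeting it, which
matches the "equal to or greater than" wording of the text.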

[0065] 

Further, the optimal-speech determining section 6a
determines, as a rank 0 (Rnk0), the speech frame data whose
average S/N value and average voice power are the maximum among
all the channels of speech frame data.

[0066] 

Then, the optimal-speech determining section 6a determines
the speech frame data that becomes the rank 0 (Rnk0) as the
candidate most suitable for speech recognition (first
candidate). Further, the optimal-speech determining section 6a
determines the speech frame data that becomes the rank 1 (Rnk1)
as the next candidate suitable for speech recognition (second
candidate). When there are a plurality of channels whose
speech frame data become the rank 1 (Rnk1), those speech frame
data which have greater average S/N values and greater average
voice powers are determined as candidates of higher ranks.
[0067] 

Further, the optimal-speech determining section 6a removes
the speech frame data that correspond to the rank 2 (Rnk2) to
the rank 4 (Rnk4) from the targets for speech recognition,
considering that they are unsuitable for speech recognition.

[0068] 

In short, the optimal-speech determining section 6a
compares the average S/N value SN(n) and the average voice
power P(n) with the threshold values THD2 and THD3 respectively,
collates the comparison results with the speech condition table
6f shown in FIG. 7 to determine the speech frame data that is
suitable for speech recognition, and then assigns a priority
order, or ranking, to the speech frame data suitable for speech
recognition. Then, the optimal-speech determining section 6a
transfers speech candidate data DCT2 indicating the ranking to
the controller 8.
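
The ordering of candidates summarized above might be sketched as
follows. The function name rank_candidates and the dictionary
layout are hypothetical. Two points are assumptions that the text
does not settle: rank 0 is given only to a channel that holds both
the maximum average S/N value and the maximum average voice power,
and rank 1 channels are ordered by average S/N value first and by
average voice power second.

```python
def rank_candidates(stats, thd2, thd3):
    """Return channel numbers ordered from best candidate downward.

    stats -- dict mapping channel number to (avg_sn, avg_power)
    Channels of rank 2 to rank 4 are left out of the result.
    """
    max_sn = max(sn for sn, _ in stats.values())
    max_power = max(p for _, p in stats.values())

    # Rank 0: both values are the maximum among all channels
    # (assumption: no rank 0 exists if no channel holds both maxima).
    rank0 = [ch for ch, (sn, p) in stats.items()
             if sn == max_sn and p == max_power]

    # Rank 1: both values reach their thresholds.
    rank1 = [ch for ch, (sn, p) in stats.items()
             if sn >= thd2 and p >= thd3 and ch not in rank0]

    # Assumed tie-break: larger S/N first, then larger voice power.
    rank1.sort(key=lambda ch: stats[ch], reverse=True)
    return rank0 + rank1
```

The returned list plays the role of the ranking carried by the
speech candidate data DCT2: the first entry is the first candidate,
and later entries are tried only if recognition fails.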

[0069] 

The noise determining section 6b collates combinations of
all the ranks for N channels that are acquired by the
optimal-speech determining section 6a with the noise selection
table 6g shown in FIG. 8, and determines any channel for which
the ranking combination has a match as a noise channel.

[0070] 

When the ranks of the individual channels starting at the
first channel CH1 are (Rnk0), (Rnk1), (Rnk2), (Rnk1), for
example, the noise determining section 6b determines the third
channel CH3 as a noise channel. Then, the noise determining
section 6b sends noise candidate data DCT3 to the controller 8.
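
The table lookup in this example can be pictured as a mapping from
a combination of ranks to a noise channel. The sketch below is
hypothetical: the names NOISE_SELECTION_TABLE and find_noise_channel
are invented for illustration, and the table holds only the single
case stated in the text, since the actual cases of FIG. 8 are not
reproduced here.

```python
# Keys are rank combinations for channels CH1..CH4 (0 means Rnk0).
# Only the one combination given in the text is entered; the real
# noise selection table 6g contains further cases.
NOISE_SELECTION_TABLE = {
    (0, 1, 2, 1): 3,  # ranks Rnk0, Rnk1, Rnk2, Rnk1 -> CH3 is noise
}


def find_noise_channel(ranks):
    """Return the noise channel number, or None if no case matches."""
    return NOISE_SELECTION_TABLE.get(tuple(ranks))
```

Returning None for an unmatched combination is a choice made for
this sketch; the text does not say what happens when no case in
the table matches.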
[0071]

Therefore, when the optimal-speech determining section 6a
determines a candidate of speech frame data suitable for speech
recognition, the noise determining section 6b determines a
noise channel corresponding to the candidate of speech frame
data suitable for speech recognition by referring to the
individual "cases" in FIG. 8. Accordingly, a candidate of
speech frame data suitable for speech recognition and noise
data obtained by the microphone that has picked up noise are
determined in association with each other.

[0072] 

Moreover, the individual cases 1, 2, 3 and so forth in the
noise selection table 6g in FIG. 8 are preset based on the
results of experiments on the voice characteristics obtained
when passengers actually uttered voices at various positions in
a vehicle in which all the microphones M1-MN were actually
installed.

[0073] 

Thus, when the speech candidate data DCT2 and the noise
candidate data DCT3 are supplied to the controller 8, the
controller 8 accesses the one of the memory areas ME1-MEN that
corresponds to the channel of the first candidate based on the
speech candidate data DCT2, reads the speech frame data most
suitable for speech recognition and supplies it to the speech
recognizer 7.
[0074] 

The speech recognizer 7 performs known processes, such as
SS (Spectrum Subtraction), echo canceling, noise canceling and
CMN, based on the speech frame data and noise frame data
supplied from the storage section 4 to thereby eliminate a
noise component from the speech frame data, performs speech
recognition based on the speech frame data from which the noise
component has been removed, and outputs data Dout representing
the result of speech recognition.

[0075] 

Moreover, if an adequate speech recognition result is not
acquired from the speech recognition performed by the speech
recognizer 7 based on the speech frame data and the noise frame
data suitable for speech recognition, the controller 8 accesses
the memory area that corresponds to the channel of the next
candidate suitable for speech recognition and transfers the
corresponding speech frame data to the speech recognizer 7.
Thereafter, the controller 8 supplies speech frame data of the
channels of subsequent candidates in order to the speech
recognizer 7 until an adequate speech recognition result is
acquired.
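
The retry behavior described here amounts to walking the ordered
candidate list until one attempt succeeds. The following is a
minimal sketch under stated assumptions: recognize is a
hypothetical callable standing in for the speech recognizer 7, and
it is assumed to return None when the result is not adequate. The
text does not define how adequacy is judged.

```python
def recognize_with_fallback(candidates, recognize):
    """Try each candidate channel in order of priority.

    candidates -- channel numbers, best candidate first
    recognize  -- hypothetical callable: channel -> result or None
    Returns (channel, result) for the first adequate result,
    or None if every candidate fails.
    """
    for channel in candidates:
        result = recognize(channel)
        if result is not None:
            return channel, result
    return None
```

For example, with candidates [1, 4, 2] and a recognizer that fails
on channel 1 but succeeds on channel 4, the loop stops at channel 4
and channel 2 is never read from the storage section.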

[0076] 

Next, an example of the operation of this speech
recognition system which has the above-described structure will
be discussed with reference to the flowcharts shown in FIGS. 9
and 10. Moreover, FIG. 9 illustrates an operational sequence
from the pickup of sounds with the microphones M1-MN to the
storage of the input data D1-DN into the storage section 4 as
frame data, and FIG. 10 illustrates the operation at the time
the data analyzer 6 determines optimal speech frame data and
noise frame data.
[0077] 

In FIG. 9, the speech recognition system stands by until
the speech switch 9 is switched on in step 100. Upon
occurrence of the ON event of the speech switch 9, the flow
goes to step 102 to perform initialization. This initializing
process clears a count value n of a channel-number counter, a
count value m of a frame-number counter and all values F(1) to
F(N) of the number-of-speeches counters FC1-FCN, all provided in
the controller 8.

[0078] 

Moreover, the channel-number counter is provided to
designate each of the channels of the microphones M1-MN with the
count value n. The frame-number counter is provided to
designate the number (address) of each of the frame areas MF1,
MF2, MF3 and so forth, provided in each of the memory areas
ME1-MEN, with the count value m.

[0079] 

N number-of-speeches counters FC1-FCN are provided in
association with the individual channels. That is, the first
number-of-speeches counter FC1 is provided in association with
the first channel, the second number-of-speeches counter FC2 is
provided in association with the second channel, and so forth
to the N-th number-of-speeches counter FCN provided in
association with the N-th channel. The number-of-speeches
counters FC1-FCN are used to determine whether or not an LPC
residual εn greater than the threshold value THD1 has
consecutively continued over three or more frames and to
determine the channel for which the LPC residual εn has
continued over three or more frames. The number-of-speeches
counters FC1-FCN are also used to determine, as a speech-input
channel, the channel for which the LPC residual εn has
continued over three or more frames.
[0080]

In the next step 104, the first frame area MF1 of each of
the memory areas ME1-MEN is set. That is, the number, m, of the
frame area is set to m = 1.

[0081]

In subsequent steps 106 and 108, the microphones M1-MN
start picking up sounds and the input data D1-DN acquired by the
voice pickup are stored in the individual first frame areas MF1
of the memory areas ME1-MEN, frame by frame.

[0082]

When one frame of input data D1-DN is stored, the memory
area ME1 that corresponds to the first (n = 1) channel is
designated in step 110, and the LPC residual εn (n = 1) of
frame data stored in the first (m = 1) frame area MF1 of the
memory area ME1 is computed in step 112.

[0083] 

Next, in step 114, the LPC residual εn is compared with
the threshold value THD1. When εn ≥ THD1, the flow goes to
step 116 to increment the value F(1) of the number-of-speeches
counter FC1 corresponding to the first channel by "1". When
εn < THD1, the flow goes to step 118 to clear the value F(1) of
the number-of-speeches counter FC1.

[0084] 

When εn becomes equal to or greater than THD1 (εn ≥
THD1), therefore, the value F(1) of the number-of-speeches
counter FC1 becomes "1" which indicates that one frame of
speech has been input to the microphone of the first
channel.

[0085] 

When εn becomes smaller than THD1 (εn < THD1), on the
other hand, the value F(1) of the number-of-speeches counter
FC1 is cleared to "0" which indicates that no speech has
been input to the microphone of the first channel.

[0086] 

Next, it is checked if n is equal to N (n = N) in step 120
to determine whether the LPC residual εn in every channel has
been computed. When n = N is not met, the flow goes to step
122 to make n = n + 1 to set the next channel, and the sequence
of processes from step 112 is repeated. That is, by repeating
the processes of steps 112 to 122, the LPC residual εn of frame
data stored in the frame area MF1 of each of the memory areas
ME1-MEN is compared with the threshold value THD1. When the LPC
residual εn becomes equal to or greater than the threshold
value THD1, the value F(n) of the number-of-speeches counter
FCn corresponding to that channel number n is incremented by
"1".

[0087] 

Next, when n = N is met in the aforementioned step 120, it
is determined that the processing for all the channels has been
completed, then the flow proceeds to step 124.

[0088] 

In step 124, it is determined if any one of the values
F(1) to F(N) of the number-of-speeches counters FC1-FCN has
become equal to or greater than "3". If there is no such
count value, i.e., if every one of the values F(1) to F(N) is
equal to or smaller than "2", the flow goes to step 126.

[0089] 

In step 126, the individual second frame areas MF2 of the
memory areas ME1-MEN are set by setting m = m + 1. Then, the
processes of steps 106 to 124 are repeated.

[0090] 

Accordingly, the input data is stored in each frame area
MF2 (steps 106 and 108), the LPC residual εn of each frame data
stored in each frame area MF2 is compared with the threshold
value THD1 (steps 110 to 114), and each of the values F(1) to
F(N) of the number-of-speeches counters FC1-FCN is incremented
or cleared based on the comparison results.

[0091] 

In step 124, it is determined again if any one of the
values F(1) to F(N) of the number-of-speeches counters FC1-FCN
has become equal to or greater than "3". If there is no
such count value, the flow goes to step 126 to set m = m + 1
so that the next frame areas MF3 of the memory areas ME1-MEN
are set. Then, the processes of steps 106 to 124 are repeated.
[0092] 

As the processes of steps 106 to 124 are repeated and at
least one of the values F(1) to F(N) of the number-of-speeches
counters FC1-FCN becomes equal to or greater than "3", the flow
proceeds to step 128.

[0093] 

In other words, in step 124, the values F(1) to F(N) of
the number-of-speeches counters FC1-FCN are checked and only
when the LPC residual εn greater than the threshold value THD1
consecutively continues over three or more frames, frame data
stored in the memory area corresponding to that channel is
determined and settled as speech frame data.

[0094] 

Next, in step 128, it is determined if the value of the
number-of-speeches counter for which it was determined that the
LPC residual greater than the threshold value THD1 consecutively
continued over three or more frames has reached "5". If that
value has not reached "5" yet, the process in step 126 is
carried out, after which the processes of steps 106 to 128 are
repeated.

[0095] 

Namely, there may be a case where, when the value F(n) of
the number-of-speeches counter that corresponds to a given
channel n becomes "3", the values of the number-of-speeches
counters corresponding to the remaining channels are "1" or "2".
In this case, frame data stored in the memory areas
corresponding to the remaining channels are likely to be also
speech frame data.
[0096] 

To cope with this case, therefore, the processes of steps 
106 to 128 are repeated twice to check if the frame data stored 
in the memory areas corresponding to the remaining channels are 
speech frame data. 
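
The counting logic of steps 112 to 128 can be sketched as below.
This is an illustrative reading, not the flowchart of FIG. 9
itself: the function name and parameter names are hypothetical,
the LPC residuals are taken as already computed, and it is an
interpretation that, once one counter reaches "5", every channel
whose counter is "3" or more is treated as a speech channel.

```python
def detect_speech_channels(residuals_per_frame, thd1,
                           settle=3, confirm=5):
    """Count consecutive frames whose LPC residual reaches THD1.

    residuals_per_frame -- iterable of lists, one list per frame,
                           holding one LPC residual per channel
    Returns (frame_number, speech_channels) once some counter
    reaches `confirm`, or None if that never happens.
    """
    counters = None
    for m, residuals in enumerate(residuals_per_frame, start=1):
        if counters is None:
            counters = [0] * len(residuals)  # step 102: clear F(n)
        for n, eps in enumerate(residuals):
            # Steps 114 to 118: increment on eps >= THD1, else clear.
            counters[n] = counters[n] + 1 if eps >= thd1 else 0
        # Steps 124 and 128: wait until a counter has reached 5.
        if max(counters) >= confirm:
            speech = [n + 1 for n, f in enumerate(counters)
                      if f >= settle]
            return m, speech
    return None
```

Because a counter is cleared whenever the residual drops below the
threshold, only an unbroken run of frames counts, which matches the
"consecutively continued" condition in the text.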

[0097] 

When the decision in step 128 is "YES", the flow goes to
step 130 where the speech detection data DCT1 which has
information on the memory area where speech frame data is
stored and the memory area where noise frame data is stored is
transferred to the controller 8. The flow then proceeds to a
routine illustrated in FIG. 10.

[0098] 

When the operation goes to the routine illustrated in 
FIG. 10, the average voice power P(n), the average noise power 
NP(n) and the average S/N value SN(n) for each channel are 
computed first in step 200. Next, a candidate of speech frame 
data suitable for speech recognition is determined based on the 
speech condition table 6f shown in FIG. 7 in step 202. In the 
next step 204, noise frame data suitable for speech recognition 
is determined based on the noise selection table 6g shown in 
FIG. 8. 



[0099] 

In step 206, the speech candidate data DCT2 that indicates
the candidate of speech frame data suitable for speech
recognition and the noise candidate data DCT3 that indicates
the noise frame data are sent to the controller 8 from the data
analyzer 6. In other words, the speech candidate data DCT2 and
the noise candidate data DCT3 inform the controller 8 of the
candidate of speech frame data suitable for speech recognition
and noise frame data suitable for speech recognition associated
with that candidate.

[0100]

Next, in step 208, the speech recognizer 7 reads the speech
frame data and noise frame data most suitable for speech
recognition from the storage section 4, performs speech
recognition on the read speech frame data and noise frame data,
and terminates a sequence of speech recognition processes when
an adequate speech recognition result is acquired.

[0101]

On the other hand, when no adequate speech recognition
result is acquired, the speech recognizer 7 checks in step 212
if there are next candidates of speech frame data and noise
frame data, reads the next candidates of speech frame data and
noise frame data, if present, from the storage section 4 and
repeats the sequence of processes starting at step 208. When
no adequate speech recognition result is obtained even after
re-execution of the speech recognition, the speech recognizer 7
likewise reads next candidates of speech frame data and noise
frame data from the storage section 4 and repeats the sequence
of processes in steps 208 to 212 until an adequate speech
recognition result is obtained.
[0102] 

According to this embodiment, as apparent from the above,
a plurality of microphones M1-MN for inputting voices are placed
in a vehicle and speech frame data and noise frame data
suitable for speech recognition are automatically extracted
from those speech frame data and noise frame data that are
picked up by the microphones and are subjected to speech
recognition. This speech recognition system can therefore
provide a plurality of speakers (passengers) with a better
operability than the conventional speech recognition system
that is designed for a single speaker.

[0103] 

For example, when one of a plurality of passengers directs
a desired speech to a certain microphone (e.g., M1), the
uttered speech may generally be picked up by the other
microphones (M2-MN) so that it is difficult to determine which
microphone has actually been intended to pick up the uttered
speech. According to this embodiment, however, speech frame
data and noise frame data suitable for speech recognition are
automatically extracted by using the speech condition table 6f
and the noise selection table 6g, respectively shown in FIGS. 7
and 8, and speech recognition is carried out based on the
extracted speech frame data and noise frame data. This makes
it possible to associate the passenger who has made the speech
with the microphone (e.g., M1) close to that passenger with a
very high probability.
[0104] 

Accordingly, this speech recognition system automatically
specifies a passenger who tries to perform a voice-based
manipulation of electronic equipment installed in a vehicle
and allows the optimal microphone (close to the passenger) to
pick up the uttered speech. This can improve the speech
recognition precision. With the use of this speech recognition
system, a passenger requires no special manipulation but merely
needs to utter words to give his or her voiced instruction
through the appropriate microphone, so that this speech
recognition system is considerably easy to use.

[0105] 

Further, suppose that while one or more passengers who do
not intend to perform a voice-based manipulation of on-board
electronic equipment are having a conversation or the like, one
person utters words to perform such a voice-based manipulation.
Even in this case, the conversation or the like made by the
passengers who are not performing the voice-based manipulation
is determined as noise and eliminated from consideration by
automatically extracting speech frame data and noise frame data
suitable for speech recognition by using the speech condition
table 6f and the noise selection table 6g, respectively shown
in FIGS. 7 and 8, and then carrying out speech recognition
based on the extracted speech frame data and noise frame data.
This can provide a speech recognition system which is not
affected by a conversation or the like taking place nearby in
the vehicle and which is very easy to use.
[0106] 

Moreover, although this embodiment is provided with the
single speech switch 9, shown in FIG. 1, which is switched on
by, for example, a driver, this invention is not limited to
this particular structure. For example, a plurality of
microphones M1-MN may be respectively provided with speech
switches TK1 to TKN as shown in the block diagram in FIG. 11, so
that when one of the speech switches is set on, the controller
8 allows the microphone that corresponds to the activated
speech switch to pick up words and determines that the
remaining microphones corresponding to the inactive speech
switches have picked up noise in the vehicle.
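
With one speech switch per microphone, the split into speech and
noise channels follows directly from the switch states. A minimal
sketch, with the hypothetical function name split_by_switch and a
list of booleans standing in for the states of the speech switches:

```python
def split_by_switch(switch_states):
    """Split channels into speech and noise by switch state.

    switch_states -- list of booleans; index i holds the state of
                     the speech switch for microphone number i + 1
    Returns (speech_channels, noise_channels) as 1-based numbers.
    """
    speech = [i + 1 for i, on in enumerate(switch_states) if on]
    noise = [i + 1 for i, on in enumerate(switch_states) if not on]
    return speech, noise
```

For four microphones with only the second switch set on, the call
split_by_switch([False, True, False, False]) yields channel 2 as
the speech channel and channels 1, 3 and 4 as noise channels.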

[0107] 

This modified structure can specify the microphone that
has picked up an uttered speech and the microphones that have
picked up noise before speech recognition. This can shorten
the processing time for determining speech data and noise data
most suitable for speech recognition.

[0108]

Further, the structure shown in FIG. 1 and the structure
shown in FIG. 11 may be combined as needed. Specifically,
speech switches smaller in number than the microphones M1-MN may
be placed at adequate locations in a vehicle so that when one
of the speech switches is set on, the controller 8 detects the
event and initiates speech recognition. In this case, the
speech switches do not completely correspond one-to-one to the
microphones M1-MN, so that while speech recognition is carried
out with the structure shown in FIG. 1, the microphone that has
picked up an uttered speech and the microphones that have
picked up noise can be specified before speech recognition.
This can shorten the processing time for determining speech
data and noise data suitable for speech recognition.
[0109] 

Further, in the case where the structure in FIG. 1 is
adapted to the case where speech switches smaller in number
than the microphones M1-MN are provided, a layout range of the
associated microphone or microphones may be defined for each
speech switch, and one or more microphones belonging to each
layout range may be specified in advance depending on which
speech switch has been set on. With this structure, the data
suitable for speech recognition have only to be extracted from
the pre-specified single or plural speech frame data and noise
frame data, thus making it possible to shorten the processing
time.

[0110] 

In addition, although the foregoing description of this
embodiment and modifications has been given of a speech
recognition system adapted to on-board electronic equipment,
the speech recognition system of this invention can also be
adapted to other types of electronic apparatuses, such as a
general-purpose microcomputer system and a so-called word
processor, to enable voice-based entry of sentences or
voice-based document editing.
[0111]

[Effect of the Invention] 

According to this invention, in short, when a speaker
makes a desired speech, a speech signal and a noise signal
suitable for speech recognition are automatically determined
from the individual speech signals output from a plurality of
voice pickup sections (or voice pickup means) and speech
recognition is carried out based on the determined speech
signal and noise signal. Accordingly, the speaker has only to
utter words or voices without consciously making such a speech
to a specific voice pickup section. This leads to an improved
operability of the speech recognition system.
[Brief Description of Drawings] 

FIG. 1 is a block diagram illustrating the structure of a 
speech recognition system according to one embodiment of the 
present invention; 

FIG. 2(A) is a plan view exemplifying the layout of
microphones in an ordinary 4-seat vehicle;

FIG. 2(B) is a plan view showing another layout of
microphones in an ordinary 4-seat vehicle;

FIG. 3(A) is a plan view exemplifying the layout of 
microphones in a wagon or the like; 

FIG. 3(B) is a plan view showing another layout of 
microphones in a wagon or the like; 

FIG. 4 is a block diagram showing the structures of a
multiplexer, a demultiplexer and a storage section;

FIG. 5 is a timing chart for explaining the timings of
sampling an input signal and storing sampled signals into a
storage section;

FIGS. 6(A) through 6(D) are explanatory diagrams for 
explaining how to compute an average voice power, an average 
noise power and an average S/N value; 

FIG. 7 is an explanatory diagram showing the structure of 
a speech condition table; 

FIG. 8 is an explanatory diagram showing the structure of 
a noise selection table; 

FIG. 9 is a flowchart for explaining the operation of the 
speech recognition system according to this embodiment; 

FIG. 10 is a flowchart for further explaining the
operation of the speech recognition system according to this
embodiment; and

FIG. 11 is a block diagram illustrating the structure of a
modification of the speech recognition system according to this
embodiment.

[Explanation of Reference Numerals]

1 ... Multiplexer

2 ... A/D Converter

3 ... Demultiplexer

4 ... Storage Section

5 ... Speech Detector

6 ... Data Analyzer

6a ... Optimal-speech Determining Section

6b ... Noise Determining Section

6c ... Average-S/N Computing Section

6d ... Average-voice-power Computing Section

6e ... Average-noise-power Computing Section

6f ... Speech Condition Table

6g ... Noise Selection Table

7 ... Speech Recognizer

8 ... Controller

9, TK1~TKN ... Speech Switches

M1~MN ... Microphones

ME1~MEN ... Memory Areas

MF1~MFn ... Frame Areas

AS1~ASN, AW1~AWN ... Analog Switches

[Document Name] Abstract

[Abstract] 

[Purpose] 

The purpose is to provide a speech recognition system
with a high utility.
[Solving Means]

A plurality of microphones M1~MN are provided, and input
signals S1~SN outputted from the microphones are
converted to input data D1~DN, which are recorded in each
memory area ME1~MEN of a storage section 4 in units of a
predetermined frame. The average S/N value and average voice
power of the input data D1~DN are calculated for each frame.
When the average S/N value and average voice power
of input data are greater than the predetermined threshold
values, the input data is determined as voice data which is
suitable for voice recognition. Also, when the average voice
power and average noise power of input data become minimum,
the input data is determined as noise data. The voice
recognition is carried out based on these voice data and noise
data.

[Selected Drawing] FIG. 1



